# 02207 : Advanced Digital Design Techniques

Exercise of Retiming

LAB 3

Group dt07

Markku Eerola (s053739)

Rajesh Bachani (s061332)

Josep Renard (s071158)

# Contents

- 1 Introduction
  - 1.1 Authors by Section
- 2 Retiming
- 3 Original Design
  - 3.1 Power dissipation and cell count
- 4 Retimed Design
  - 4.1 Power dissipation and cell count
- 5 Discussion
- 6 Appendix A: Power reports from the original design
- 7 Appendix B: Power reports from the retimed design

## 1 Introduction

This document is the report for the third exercise in the DTU course Advanced Digital Design Techniques. In this exercise we studied the concept of retiming using a radix-4 digit-recurrence division implementation with a carry-save adder.

In the course of the exercise we examined two designs: the original design for digit-recurrence division and the retimed design. We first compiled, simulated and synthesized the original design to obtain its power report and cell counts. We then modified the VHDL code of the original design to retime the recurrence. The retimed design was compiled, simulated and synthesized in the same way to obtain its power report and cell counts.

In the following sections we briefly explain the concept of retiming, present the original and the retimed circuit, and report the power dissipated and the cells used in each. In the last section we discuss the results.

### 1.1 Authors by Section

- Rajesh Bachani
- Josep Renard
- Markku Eerola

## 2 Retiming

Retiming is an optimization technique in which the structural location of registers is moved, here manually, without affecting the functionality of the circuit, in order to improve its performance. This is done either by removing a register from each input of a block and adding a register to each output, or by adding registers to the inputs and removing registers from the outputs.
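
The register move described above can be checked with a small cycle-based simulation. The sketch below is not part of the lab code: the block function `f`, the two helper functions and the reset values are hypothetical and chosen only for illustration. It compares a block with registers on its inputs against the same block with a single register on its output.

```python
def f(a: int, b: int) -> int:
    """Hypothetical combinational block (any pure function would do)."""
    return (a + 3 * b) & 0xFF


def registers_on_inputs(stimulus, reset=(0, 0)):
    """Original form: one register per input, block f after the registers."""
    ra, rb = reset
    outputs = []
    for a, b in stimulus:
        outputs.append(f(ra, rb))  # value visible during this cycle
        ra, rb = a, b              # clock edge: input registers capture a, b
    return outputs


def register_on_output(stimulus, reset=(0, 0)):
    """Retimed form: block f first, a single register on its output."""
    ry = f(*reset)                 # reset value chosen to match the original form
    outputs = []
    for a, b in stimulus:
        outputs.append(ry)         # value visible during this cycle
        ry = f(a, b)               # clock edge: output register captures f(a, b)
    return outputs


stimulus = [(1, 2), (5, 7), (200, 100), (0, 255)]
assert registers_on_inputs(stimulus) == register_on_output(stimulus)
```

Both forms produce the same output sequence, provided the reset value of the output register equals `f` applied to the reset values of the input registers. In this toy case the retimed form also needs one register instead of two.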

In our case, the motivation for retiming was to create slack on a non-critical path so that the synthesizer can substitute HS cells with LL cells on this path, thus lowering the overall power dissipation of the circuit. According to the lecture slides, the circuit we were studying should save approximately 30% power with this kind of retiming.

## 3 Original Design

The original design, which we aimed to improve by retiming, is presented in Figure 1.

The Sel. function block implements the quotient digit selection function. The selection function determines a 4-bit quotient digit using the 3 most significant bits of the divisor d and the 7 most significant bits of the results stored in registers Ws and Wc.

The MUX block selects the input for the divisor multiplication between the dividend, which is used only in the initialization phase of the division algorithm, and the result of subtracting the quotient digit/divisor multiplication result from the dividend. The subtraction result is stored in register Ws.

The Multiple gen. block implements the divisor multiplication, i.e. it multiplies the divisor d by the 4-bit quotient digit. This block is basically a multiplexer.
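
Why multiplication by the quotient digit reduces to a multiplexer can be seen in a short sketch. It assumes the digit set {-2, -1, 0, 1, 2}, which is common for radix-4 division but is not stated in this report. The function name `multiple_gen` is hypothetical, and the 4-bit encoding of the digit used in the lab design is not modelled.

```python
def multiple_gen(d: int, q: int) -> int:
    """Return q * d for q in {-2, ..., 2} by selection only.

    Assumption: radix-4 digit set {-2, -1, 0, 1, 2}. Every multiple is
    zero, the divisor, its left shift by one, or the negation of one of
    these, so no multiplier is needed.
    """
    candidates = {
        -2: -(d << 1),
        -1: -d,
        0: 0,
        1: d,
        2: d << 1,
    }
    return candidates[q]  # the "multiplexer": pick one precomputed multiple


for q in range(-2, 3):
    assert multiple_gen(13, q) == q * 13
```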



Figure 1: Digit recurrence division

The Carry Save Adder block implements the subtraction of the result of the divisor multiplication from the dividend. As the name of the block suggests, the subtraction is done with a carry-save adder.

The registers Ws and Wc store the sum and the carry from the carry-save adder, respectively.
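
The carry-save principle can be illustrated bit-wise. The sketch below is a generic 3:2 carry-save step on non-negative integers, not the lab's VHDL, and the function name `carry_save_add` is hypothetical. It reduces three operands to a sum word and a carry word without propagating carries, which is why the result is kept in two registers.

```python
from typing import Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """Reduce three operands to (sum word, carry word) without carry propagation."""
    sum_word = a ^ b ^ c                              # per-bit sum, carries ignored
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, weighted by 2
    return sum_word, carry_word


s, cy = carry_save_add(0b1011, 0b0110, 0b1101)
assert s + cy == 0b1011 + 0b0110 + 0b1101
```

Only the final conversion of the pair into a single word needs a carry-propagate addition.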

The critical path of this circuit is marked with red arrows in Figure 1.

### 3.1 Power dissipation and cell count

The Synopsys VSS Simulator was used to annotate the switching activity based on a testbench and test vectors. Design Vision then used this switching activity to estimate the power dissipation within the circuit. The results for each component block can be seen in Table 1. The actual report is in Appendix A.

The table also shows the number of HVT and SVT cells in each component block.

Table 1: Power dissipation in original circuit (uW) and cell count

| Block      | P static | P dynamic | P total | SVT cells | HVT cells |
|------------|----------|-----------|---------|-----------|-----------|
| Control    | 0.8      | 35.7      | 36.5    | 21        | 24        |
| Mux        | 0.12     | 28.3      | 28.4    | 1         | 57        |
| Mult. gen. | 6.8      | 124.0     | 130.8   | 226       | 51        |
| CSA        | 7.4      | 196.4     | 203.8   | 141       | 35        |
| SEL        | 3.3      | 68.1      | 71.4    | 80        | 5         |
| Reg W      | 13.3     | 336.7     | 350.0   | 315       | 8         |

## 4 Retimed Design

The retimed design is presented in Figure 2.

Figure 2: Digit recurrence division retimed

Since only the most significant bits of the sum and carry are used in the quotient digit selection, it makes sense to separate the most significant bits from the least significant bits into separate structural slices. This is done in the VHDL code by disconnecting the quotient digit selection block from the W registers, thus removing registers from the inputs of the block and adding new registers to its output. This frees the W registers from the most significant slice. By also separating the implementation of the most significant bits of the adder and the multiple generation from the implementation for the less significant bits, more of the design is freed from the critical path.

We achieved these changes by editing the top-level VHDL file for the original design. We disconnected the higher bits of register W from the selection function and introduced the register q as a new component. We connected the high bits of CSA directly to the selection function and connected the new register between the selection and the multiple generation.

These changes do not affect the functionality of the circuit, but by dividing the implementation into a most significant and a least significant slice we get two paths. The critical path is $T(\mathrm{SEL}) + T(\mathrm{reg\ q}) + T(\mathrm{mux}) + T(\mathrm{CSA})$. The delay on the non-critical path is at most $T(\mathrm{reg\ W}) + T(\mathrm{mux}) + T(\mathrm{CSA})$. From this it can be seen that the non-critical path has some slack, which the synthesizer should be able to use to optimize the least significant slice for power, namely by replacing HS cells with LL cells.
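
Subtracting the two path delays above makes the available slack explicit. This is a sketch under the assumption that the $T(\mathrm{mux})$ and $T(\mathrm{CSA})$ terms are equal on both paths; the symbol $T_{\mathrm{slack}}$ is introduced here only for illustration.

$$
\begin{aligned}
T_{\mathrm{slack}} &= \bigl[T(\mathrm{SEL}) + T(\mathrm{reg\ q}) + T(\mathrm{mux}) + T(\mathrm{CSA})\bigr] - \bigl[T(\mathrm{reg\ W}) + T(\mathrm{mux}) + T(\mathrm{CSA})\bigr] \\
&= T(\mathrm{SEL}) + T(\mathrm{reg\ q}) - T(\mathrm{reg\ W})
\end{aligned}
$$

If the two registers have similar delays, the slack is roughly the delay of the selection function.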

### 4.1 Power dissipation and cell count

Just as with the original design, the Synopsys VSS Simulator was used to annotate the switching activity based on a testbench and test vectors. Design Vision again used this switching activity to estimate the power dissipation within the circuit. The results for each component block can be seen in Table 2. The actual report is in Appendix B.

The table also shows the number of HVT and SVT cells in each component block.

Table 2: Power dissipation in retimed circuit (uW) and cell count

| Block      | P static | P dynamic | P total | SVT cells | HVT cells |
|------------|----------|-----------|---------|-----------|-----------|
| Control    | 0.9      | 34.9      | 35.8    | 21        | 22        |
| Mux        | 0.3      | 24.9      | 25.2    | 9         | 53        |
| Mult. gen. | 0.7      | 48.1      | 48.8    | 20        | 154       |
| CSA        | 1.6      | 116.3     | 117.9   | 26        | 150       |
| SEL        | 3.0      | 64.2      | 67.2    | 71        | 7         |
| Reg W      | 3.0      | 258.4     | 261.4   | 118       | 116       |
| Reg q      | 0.7      | 16.3      | 17.0    | 20        | 3         |
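
The per-block figures in Tables 1 and 2 can be summed to compare the two designs against the roughly 30% saving quoted from the lecture slides. The script below is only a convenience sketch: the dictionary names are hypothetical, the values are the P total columns copied from the two tables, and the sums are not taken from the Design Vision reports.

```python
# P total per block in uW, copied from Table 1 (original design).
original_uw = {
    "Control": 36.5, "Mux": 28.4, "Mult. gen.": 130.8,
    "CSA": 203.8, "SEL": 71.4, "Reg W": 350.0,
}

# P total per block in uW, copied from Table 2 (retimed design).
retimed_uw = {
    "Control": 35.8, "Mux": 25.2, "Mult. gen.": 48.8,
    "CSA": 117.9, "SEL": 67.2, "Reg W": 261.4, "Reg q": 17.0,
}

total_original = sum(original_uw.values())
total_retimed = sum(retimed_uw.values())
saving = 1.0 - total_retimed / total_original  # relative power saving

print(f"original: {total_original:.1f} uW")
print(f"retimed:  {total_retimed:.1f} uW")
print(f"saving:   {saving:.1%}")
```

With these figures the sum over the listed blocks drops from 820.9 uW to 573.3 uW, a saving of about 30%.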

## 5 Discussion

## 6 Appendix A: Power reports from the original design

## 7 Appendix B: Power reports from the retimed design